Abstract

Innate immunity is a theme of increasing interest for HIV research. However, the term is overstretched 
to cover biological barriers, cellular systems, soluble factors, signaling pathways, and effectors and is 
inconsistently applied. A clearer semantic classification of the components of innate immunity is needed, 
which will have direct relevance to the interpretation of human genome variation. Here, we discuss 
genomic approaches that can assist in re-defining the perimeter of innate immunity. We place particular 
emphasis on the characteristics of effectors of the intracellular defense against HIV and other pathogens. 



Introduction 

Due to their collective significance in mediating the host
response against pathogens, the genes of the interferon
response have been an area of particular focus in the field
of antiviral defense. This system includes the induction
of several hundred interferon-stimulated genes.
Understanding the biology of interferon-stimulated genes is
challenging because of the diversity in their specificity and
breadth of action against pathogens. Among them, and of
considerable interest in the field of HIV research, are the
paradigmatic retroviral restriction factors TRIM5α,
APOBEC3G, and BST2/Tetherin [1], as well as newly
identified factors such as SAMHD1 [2,3] and SLFN11 [4].
A second challenge in studying interferon-stimulated gene
biology is understanding their apparent lack of efficacy
against HIV infection. During chronic infection, a strong
interferon response does not correlate with lower levels of
HIV viral load [5]. Moreover, persistent production of
interferon during chronic infection is thought to be
deleterious [5-7]. This stands in contrast with the efficacy
of exogenous administration of interferon, which contributes
to active control of HIV infection in vivo [8]. Adding to the
challenge, antiviral responses can also be triggered by
interferon-independent pathways [9]. Understanding the
protective and deleterious contributions of the innate
cellular response to HIV requires a more complete
understanding of the components of this defense machinery.



In this report, we highlight approaches from genomics
that can help in these endeavors. The emphasis is on
the components of intracellular defense; thus, we do not
discuss non-cell-autonomous systems that are typically
considered to be within the innate framework (e.g. NK
cells, macrophages, and dendritic cells).

Genes comprising innate immunity? 

There are multiple resources that list and categorize
components of innate immunity. The Gene Ontology
project (http://www.geneontology.org/) [10] includes the
term "innate immune response" (GO:0045087); InnateDB
(http://www.innatedb.ca) [11] identifies curated
genes, experimentally verified protein interactions, and
signaling pathways involved in innate immunity; and the
interferon-stimulated gene database [12] identifies
interferon-stimulated genes through expression analyses.
Additionally, recent work compiled a list of
interferon-stimulated genes that were used for extensive
functional analyses in the context of viral infection,
including HIV [13]. However, the overlap across these
various lists is limited: of 1492 genes included in one or
more of the above resources, only 25 are common to all four sets
(Figure 1). The reasons for this lack of consensus are
manifold: diversity of biology (innate immunity refers to
activities spanning many molecular and cellular
functions), methodological approaches for gene identification
(e.g. microarray, functional assays), experimental setups
(e.g. different cell lines, stimuli, pathogens), and diverse
levels of confidence in annotation. More recently, a
number of initiatives have aimed at defining restricted
sub-fields, such as that of the intrinsic cellular defense [14]
and of cell-autonomous immunity [15]. Clearly, an effort
of convergence is needed as experts agree that the "function
of the several hundred genes has been comprehensively
summarized to only limited extents" [16,17]. A number
of recent papers have advanced evolutionary genomics,
human genetic approaches, and large scale functional
genomic screens that dissect components of innate
immunity.

Figure 1. Overlap of four innate immunity gene sets
Venn diagram representing four sets of human innate immunity genes: i) in purple, 649 genes associated with the GO term "innate immune response"
(GO:0045087); ii) in green, 927 manually annotated genes from InnateDB; iii) in blue, 369 interferon-stimulated genes in the interferon-stimulated gene
database; iv) in yellow, 424 interferon-stimulated genes compiled by Schoggins et al. [13].
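
The kind of overlap reported for the four gene lists (size of the union, genes shared by all lists) can be computed with plain set operations. The following is a minimal sketch, not the procedure used for Figure 1: the function name `overlap_summary`, the set labels, and the placeholder gene identifiers `G1` to `G7` are all assumptions made for illustration, and the toy memberships do not reflect the real contents of any resource.

```python
from itertools import combinations


def overlap_summary(gene_sets):
    """Summarize overlap between named gene sets.

    gene_sets maps a label to a set of gene identifiers.
    Returns the union size, the genes common to all sets,
    and the size of every pairwise intersection.
    """
    names = sorted(gene_sets)
    union = set().union(*gene_sets.values())
    core = set.intersection(*(set(gene_sets[n]) for n in names))
    pairwise = {
        (a, b): len(set(gene_sets[a]) & set(gene_sets[b]))
        for a, b in combinations(names, 2)
    }
    return {"union": len(union), "core": sorted(core), "pairwise": pairwise}


# Hypothetical toy memberships; G1..G7 are placeholders, not real genes.
toy_sets = {
    "GO_0045087": {"G1", "G2", "G3", "G4"},
    "InnateDB": {"G1", "G2", "G4", "G5"},
    "ISG_database": {"G1", "G2", "G6"},
    "Schoggins_et_al": {"G1", "G2", "G3", "G7"},
}

summary = overlap_summary(toy_sets)
print(summary["union"], summary["core"])
```

In practice the four lists would first need to be mapped to a common gene identifier scheme; differences in identifiers alone can deflate the apparent overlap.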

The evolutionary view 

It is well accepted that genes and cellular pathways
enriched for signals of positive selective pressure are
frequently involved in the immune response [18,19]. The
underlying concept is that evasion from, and co-evolution
with, pathogens is one of the strongest evolutionary
pressures, resulting in signals identifiable through
comparative genomics. It is expected that genes with such
characteristics have an effector role, and that the signals
will be most pronounced at domains of direct interaction
with a pathogen [20]. Indeed, signatures of positive
selection are enriched in the various sets of innate immunity
genes (Figure 2A). The HIV restriction factors TRIM5α,
APOBEC3G, BST2 and SAMHD1 are relevant examples of
genes that have undergone positive selection [20].
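
For readers less familiar with the measure of positive selection used in Figure 2, the dN/dS ratio is conventionally defined as below. The symbol ω is the common notation in the molecular evolution literature, not one used in this report, and the snippet assumes the `amsmath` package for `cases` and `\text`.

```latex
% d_N: nonsynonymous substitutions per nonsynonymous site
% d_S: synonymous substitutions per synonymous site
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Gene-wide averages rarely exceed 1 because most codons in a protein are constrained, so a value that is elevated relative to the genome-wide distribution is usually read as the signal of interest.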

Gene expansion (Figure 2B) and, particularly in primates,
segmental duplications [21,22] are prominent features of
innate immunity genes. The resulting gene duplications
may lead to increased gene dosage, and to neo- or
sub-functionalization [23]. The current state of functional
annotation suggests that characterization of duplicated
innate immunity genes is largely incomplete. For example,
83% of the 927 genes in InnateDB have paralogs (genes
emerging from duplication events). However, most of
these paralogs have not themselves been annotated as part
of innate immunity, despite many of them showing high
levels of sequence similarity. Therefore, estimates of
positive selection and patterns of duplication can help
establish categories within innate immunity genes. As
a corollary, these metrics could serve to annotate genes
that have not been previously considered part of innate
immunity.
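
The idea of using paralogy to flag genes not yet annotated as innate immunity can be sketched as a simple filter. This is an illustrative assumption about how such a screen might be set up, not a method described in this report: the function name `unannotated_paralogs`, the sequence-identity cutoff `min_identity`, and the placeholder gene names are hypothetical.

```python
def unannotated_paralogs(annotated, paralogs, min_identity=0.5):
    """Find paralogs of annotated genes that lack the annotation.

    annotated: set of genes already annotated as innate immunity.
    paralogs:  dict mapping a gene to a list of (paralog, identity) pairs,
               with identity expressed as a fraction between 0 and 1.
    Returns a dict: candidate -> (closest annotated gene, identity).
    """
    candidates = {}
    for gene in annotated:
        for paralog, identity in paralogs.get(gene, []):
            # Skip paralogs that are already annotated or too divergent.
            if paralog in annotated or identity < min_identity:
                continue
            # Keep the most similar annotated gene for each candidate.
            best = candidates.get(paralog)
            if best is None or identity > best[1]:
                candidates[paralog] = (gene, identity)
    return candidates


# Hypothetical toy data; gene names and identities are placeholders.
annotated = {"GENE_A", "GENE_B"}
paralogs = {
    "GENE_A": [("GENE_A2", 0.82), ("GENE_B", 0.40)],
    "GENE_B": [("GENE_A2", 0.55), ("GENE_B2", 0.30)],
}

candidates = unannotated_paralogs(annotated, paralogs)
print(candidates)
```

Sequence similarity alone does not establish function; a list produced this way would be a starting point for prioritization, ideally combined with evidence of positive selection.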

A human genetics view 

Exome and whole-genome sequencing in thousands of
individuals have revealed large numbers of variants that
change amino acid sequences [24]. Increasing numbers
of non-synonymous variants are a feature of genes under
positive selection, and of genes of innate immunity
(Figure 3). In some instances, variants introduce nonsense
mutations that, when homozygous, may result in natural
knock-outs. We estimate, on the basis of 1092 exome
sequences from the 1000 Genomes Project [25], that
around 10% of innate immunity genes carry a homozygous
stop codon or frameshift variant that may lead to
a loss of function. As a corollary to the discussion on gene
expansion in the evolution of genomes, there is also
interest in the presence of copy number variation and, in
particular, deletions, which are also enriched in human genes
involved in innate immunity and the inflammatory
response [26]. The interpretation of these data includes
the possibility that greater genetic diversity provides a
benefit to the species, i.e. through balancing selection.
However, the high frequency of functional variation in
innate immune genes could also represent the substrate
of human susceptibility to infection, including the
possibility of selective immunodeficiency [27]. Exome
and whole-genome sequencing to understand rare human
variation in the setting of HIV is an important research
avenue that will complement the various genome-wide
association studies that have been published in the field
[28,29].

Figure 2. Evolutionary pattern of innate immunity gene sets
Panel (a), x-axis: primate dN/dS ratio. In grey, the genome-wide distribution (density) of dN/dS values, a measure of positive selection [34], for 19252 protein coding genes in primates.
Lines depict the distribution of dN/dS values for genes associated with the various innate immunity sets discussed in Figure 1. Panel (b), x-axis: number of duplication
events in the evolutionary history of a gene. Distribution of duplication events across the various innate immunity gene sets. The histogram depicts the proportion of genes that
have none, one or more duplications in the human genome. Dotted lines represent the duplication events for genes associated with the innate immunity gene sets.
The values of those measurements for the prototypical innate immunity genes SAMHD1, BST2, TRIM5 and APOBEC3G are indicated.

Figure 3. Burden of human genetic variation in innate immunity genes
(a) Non-synonymous coding variants in 14213 human genes in the 1000 Genomes Project are plotted according to estimates of positive selection in
primates. The x-axis distributes genes from the most conserved (lower decile intervals) to the genes under positive selection (higher decile intervals).
In red, innate immunity genes (n=1143): the greater the signal of positive selection, the more frequent the identification of non-synonymous variants.
In grey, the rest of human genes. (b) This trend is not observed for synonymous variants. Horizontal black lines represent median values for the
protein coding genome.
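
The decile layout described for Figure 3 (genes ranked from most conserved to most positively selected, then summarized per decile) can be sketched as follows. This is an assumed reconstruction of the binning idea only: the function name `median_by_decile`, the input format, and the synthetic values are hypothetical and do not reproduce the data behind the figure.

```python
from statistics import median


def median_by_decile(genes):
    """Median variant count per dN/dS decile.

    genes: list of (dn_ds, n_nonsynonymous_variants) tuples.
    Returns a list of 10 medians, most conserved decile first;
    a decile with no genes is reported as None.
    """
    ranked = sorted(genes, key=lambda g: g[0])
    n = len(ranked)
    bins = [[] for _ in range(10)]
    for rank, (_, count) in enumerate(ranked):
        # Integer arithmetic maps ranks 0..n-1 onto deciles 0..9.
        bins[rank * 10 // n].append(count)
    return [median(b) if b else None for b in bins]


# Synthetic example: 20 genes whose variant count rises with dN/dS.
toy_genes = [(i / 20, i) for i in range(20)]
deciles = median_by_decile(toy_genes)
print(deciles)
```

Running the same function separately on innate immunity genes and on the remaining genes, and again with synonymous variant counts, would give the contrasts that the two panels describe.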

A functional view 

There is significant room for characterization of innate
immune genes through the iterative combination of
genomic and functional assays. Some commonly applied
tools include silencing RNA and gain-of-function screens,
large-scale co-immunoprecipitation of interacting host
and pathogen proteins in cell lines, and phosphoproteome
studies [30,31]. However, it is broadly acknowledged
that the interferon response is deficient in many
laboratory cell lines, which explains their utility in
pathogen research. This observation notwithstanding, the
underlying integrity of the cellular innate immune system
is rarely considered. For example, RNA sequencing of
SupT1 or 293T cells, highly permissive cell lines used in
HIV research, shows that they are poorly equipped to
respond to the incoming virus [32]. Between 25 and 50%
of innate immunity genes are downregulated or not
expressed in these cell lines, which stands in contrast with
their level of expression in primary CD4+ T cells. Thus,
analysis of expression in multiple cellular systems
generates a checkerboard of innate immunity genes that
are absent in one or more susceptible cell types, but
present in cell lines or primary cells that do not support
pathogen replication. This pattern can be leveraged to further
define the perimeter of innate cellular defense. Specifically,
genes of innate immunity that are missing in susceptible
cell lines, and present in primary cells, can be considered
as candidates for further investigation.
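
The expression "checkerboard" lends itself to a simple presence/absence filter. The sketch below is one possible reading under stated assumptions: the function name `candidate_effectors`, the expression `threshold` (in arbitrary units), the gene names, and all numeric values are hypothetical, and the requirement that a gene be absent from every permissive line is a stricter choice than the "one or more" wording in the text.

```python
def candidate_effectors(expression, permissive, primary, threshold=1.0):
    """Select genes silent in permissive lines but expressed in primary cells.

    expression: dict mapping gene -> {cell type: expression level}.
    permissive: cell lines that support pathogen replication.
    primary:    primary cell types used as the reference.
    A missing measurement is treated as no expression (0.0).
    """
    hits = []
    for gene, levels in expression.items():
        silent_in_lines = all(
            levels.get(cell, 0.0) < threshold for cell in permissive
        )
        present_in_primary = all(
            levels.get(cell, 0.0) >= threshold for cell in primary
        )
        if silent_in_lines and present_in_primary:
            hits.append(gene)
    return sorted(hits)


# Hypothetical expression values; gene names are placeholders.
toy_expression = {
    "GENE_X": {"SupT1": 0.0, "293T": 0.2, "CD4_T": 15.0},
    "GENE_Y": {"SupT1": 8.0, "293T": 0.1, "CD4_T": 12.0},
    "GENE_Z": {"SupT1": 0.0, "293T": 0.0, "CD4_T": 0.3},
}

hits = candidate_effectors(toy_expression, ["SupT1", "293T"], ["CD4_T"])
print(hits)
```

Replacing the first `all` with `any` would implement the looser criterion, at the cost of a longer candidate list.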

Conclusions 

From a genomic perspective, a number of approaches are
useful to characterize innate immunity, and thus the first
barrier of defense against HIV. Positive selection, gene
duplication, human genetic diversity, and differential
expression across cell lines and primary cells are
quantifiable features that point to a differential
genomic landscape of effector genes participating
in protection against HIV and other pathogens [33].